Carlina Feldmann
Lennart Oelschläger
Version of 19.03.2023
Welcome to this tiny course on data visualization in R with {ggplot2}! 👋
Potentially, plots can beautifully inform or horribly mislead. Colors and shape matter! ⚖️
The {ggplot2} package implements a grammar of graphics, a series of distinct tasks to make a graphic.
Learning goal: being in decent control of {ggplot2} to produce meaningful plots.
Prerequisites: basic R skills + a not-too-old version of R (>= 2.10) + RStudio
Executing the following lines in R gives you access to the course material:
install.packages("remotes")
remotes::install_github("loelschlaeger/rcourse", upgrade = "never")
library("rcourse")
To open a copy of these slides, type:
To start the practicals, type:
You can leave a note here on GitHub. 🙏
First we get {ggplot2}.
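The code for this step is not shown here. A minimal sketch, assuming the package is installed from CRAN in the usual way:

```r
# Install {ggplot2} only if it is not available yet, then attach it.
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}
library(ggplot2)
```

The requireNamespace() check simply avoids reinstalling the package on every run.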
Next we need data. Let’s go with an excerpt from the famous Gapminder dataset:
## # A tibble: 6 × 6
##   country     continent  year lifeExp      pop gdpPercap
##   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
## 1 Afghanistan Asia       1952    28.8  8425333      779.
## 2 Afghanistan Asia       1957    30.3  9240934      821.
## 3 Afghanistan Asia       1962    32.0 10267083      853.
## 4 Afghanistan Asia       1967    34.0 11537966      836.
## 5 Afghanistan Asia       1972    36.1 13079460      740.
## 6 Afghanistan Asia       1977    38.4 14880372      786.
## tibble [1,704 × 6] (S3: tbl_df/tbl/data.frame)
## $ country : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ year : int [1:1704] 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
## $ lifeExp : num [1:1704] 28.8 30.3 32 34 36.1 ...
## $ pop : int [1:1704] 8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
## $ gdpPercap: num [1:1704] 779 821 853 836 740 ...
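The two outputs above look like the results of head() and str(). A sketch of how they could be produced, assuming the data comes from the CRAN package {gapminder} (the course may provide it in another way):

```r
# Assumption: the excerpt is the `gapminder` tibble from the {gapminder} package.
if (!requireNamespace("gapminder", quietly = TRUE)) {
  install.packages("gapminder")
}
library(gapminder)

head(gapminder)  # first six rows
str(gapminder)   # compact overview of all columns and their types
```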
First, we tell the ggplot() function what data we use
and what variables we wish to see on each axis:
Something is missing … 🤔 We need an additional layer, a
geom_* function!
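A minimal sketch of these two steps, assuming the {gapminder} package as the data source. The object name p_points is only used here for illustration:

```r
library(ggplot2)
library(gapminder)

# Step 1: data and axis mapping only. Printing p shows axes but no data,
# because the plot has no layers yet.
p <- ggplot(
  data = gapminder,
  mapping = aes(x = gdpPercap, y = lifeExp)
)
p

# Step 2: adding a geom_* layer draws the observations as points.
p_points <- p + geom_point()
p_points
```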
There are more of them which we can simply add (literally add!):
p <- ggplot(
  data = gapminder,
  mapping = aes(x = gdpPercap, y = lifeExp)
)
p <- p + geom_point() + geom_smooth()
p
As a last polishing step for now, we improve the x-axis scale and the plot labels.
p + scale_x_log10(labels = scales::dollar) +
  labs(x = "GDP per capita",
       y = "Life expectancy in years",
       title = "Economic growth as an indicator for life expectancy",
       subtitle = "Data points are country-years",
       caption = "Source: Gapminder")
Finally, we can use the ggsave() function to save our plot:
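The ggsave() call itself is not shown here. A sketch under assumptions: the file name, size, and resolution below are chosen for illustration only.

```r
library(ggplot2)
library(gapminder)

p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  scale_x_log10(labels = scales::dollar)

# Hypothetical output location; any path ending in .png, .pdf, etc. works,
# since ggsave() picks the graphics device from the file extension.
out_file <- file.path(tempdir(), "gapminder_plot.png")

# width and height are in inches by default.
ggsave(filename = out_file, plot = p, width = 8, height = 5, dpi = 300)
```

Passing plot = p explicitly avoids relying on the default, which saves whichever plot was displayed last.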
ggplot()
data = ...
mapping = aes(...)
geom_*() functions
Our goal is to plot the trajectory of life expectancy over time for each country in the gapminder data.
This looks odd: we forgot to group by country! 💡
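The two plots behind this remark are not shown here. A sketch of the difference, assuming the {gapminder} package. The names p_ungrouped and p_grouped are illustrative:

```r
library(ggplot2)
library(gapminder)

# Without grouping, geom_line() connects all observations in order of year,
# jumping between countries, which gives a zigzag pattern.
p_ungrouped <- ggplot(gapminder, aes(x = year, y = lifeExp)) +
  geom_line()

# With the group aesthetic, each country gets its own line.
p_grouped <- ggplot(gapminder, aes(x = year, y = lifeExp)) +
  geom_line(aes(group = country))

p_ungrouped
p_grouped
```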
But can you make sense of this mess? Luckily, we can additionally split the plot into one panel per continent:
ggplot(data = gapminder, mapping = aes(x = year, y = lifeExp)) +
  geom_line(aes(group = country)) +
  facet_wrap(~continent)
Better not to facet_wrap(~country)… 🛑 Let’s polish our plot with the things we have already learned:
ggplot(data = gapminder, mapping = aes(x = year, y = lifeExp)) +
  geom_line(color = "grey", aes(group = country)) +
  geom_smooth() +
  facet_wrap(~continent) +
  labs(x = "Year",
       y = "Life expectancy",
       title = "Life expectancy over time on five continents")
Notice that we supplied a formula to facet_wrap. This can be more advanced, for example (with facet_grid):
ggplot(data = socviz::gss_sm, mapping = aes(x = age, y = childs)) +
  geom_point(alpha = 0.2) +
  geom_smooth() +
  facet_grid(sex ~ race) +
  labs(x = "Age",
       y = "No. of children",
       title = "Relationship between age and number of children",
       subtitle = "Separated by sex (in rows) and race (in columns)")
As a last input for this part, we learn four new geoms:
Using relative instead of absolute counts on the y-axis is covered in the tutorials.
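The remark about counts suggests a bar chart, but the slide's own code and data are not shown here. A minimal sketch of geom_bar(), assuming the gapminder data for 2007 as a stand-in:

```r
library(ggplot2)
library(gapminder)

# One row per country in 2007 (base R subsetting, no extra packages needed).
gap_2007 <- gapminder[gapminder$year == 2007, ]

# geom_bar() counts the rows per category, so the y-axis shows absolute
# counts: here, the number of countries on each continent.
p_bar <- ggplot(gap_2007, aes(x = continent)) +
  geom_bar()
p_bar
```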
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 10 rows containing non-finite values (`stat_bin()`).
There is a message and a warning. We will address both in the practicals.
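The histogram that produced this output is not shown here, and its data is unknown. As a sketch only, here is a histogram of the gapminder data with an explicit binwidth (a value of 5 years is an arbitrary choice for illustration):

```r
library(ggplot2)
library(gapminder)

# Setting binwidth explicitly replaces the default of 30 bins that the
# `stat_bin()` message refers to.
p_hist <- ggplot(gapminder, aes(x = lifeExp)) +
  geom_histogram(binwidth = 5)
p_hist
```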
library(dplyr)
ggplot(data = filter(gapminder, year == 2007),
       mapping = aes(x = lifeExp)) +
  geom_density()
ggplot(data = filter(gapminder, year == 2007),
       mapping = aes(x = pop,
                     y = reorder(continent, pop))) +
  geom_boxplot() +
  scale_x_log10() +
  labs(y = NULL,
       x = "Populations in 2007")
In the tutorials, we look at a variant of the basic boxplot that {ggplot2} offers.
R can work with geographical data, and {ggplot2} can produce choropleth maps.
world <- map_data("world")
p <- ggplot(data = world, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "white", color = "black")
plot(p)
The code above plots longitude against latitude without a map projection. Adding coord_map() applies one; instead of its default Mercator projection, we can use the Albers projection:
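The projection code is not shown here. A sketch under assumptions: it uses the US state outlines from map_data("state") rather than the world map, and the standard parallels lat0 = 39 and lat1 = 45 are illustrative values. map_data() needs the {maps} package, and drawing a plot with coord_map() needs {mapproj}.

```r
library(ggplot2)

# Assumption: US state outlines, for which an Albers projection is a common choice.
us_states <- map_data("state")

# coord_map() passes lat0 and lat1 on to the projection as its two
# standard parallels.
p_albers <- ggplot(us_states, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "white", color = "black") +
  coord_map(projection = "albers", lat0 = 39, lat1 = 45)
p_albers
```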
Now in the tutorials, we will visualize the results of the 2016 Trump vs. Clinton election on a map of the US states.
Reproduce this plot! 😎
Don’t forget to install and load the packages {ggplot2} and {dplyr} and load the gapminder dataset. If you want to see some hints, scroll down this page.
Hint 1: Use your {dplyr} knowledge to create an extract of the gapminder dataset that only contains values from 2007.
Hint 2: Have a look at the 3rd slide of this presentation to copy the basic syntax and remember how to modify the labels.
Hint 3: You can set the size and colour of the points to depend on
certain variables in the aesthetics aes().
Hint 4: Have a look at ?guides to modify the legends.
{ggplot2} itself does not allow for interactive or animated visualizations. However, there are (of course) packages to achieve this, e.g. {plotly}, {gganimate}, {shiny}.
plot <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, size = pop, colour = continent)) +
  geom_point(alpha = 0.7) +
  scale_x_log10(labels = scales::dollar) +
  guides(size = "none") +
  guides(colour = guide_legend(title = "")) +
  labs(
    x = "GDP per capita",
    y = "Life expectancy in years",
    title = "Economic growth as an indicator for life expectancy",
    caption = "Source: Gapminder"
  )
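One way to make such a plot interactive is plotly::ggplotly(), which converts a ggplot object into an interactive widget. A minimal sketch, assuming {plotly} and {gapminder} are installed. The names static_plot and interactive_plot are illustrative, and the plot is a shortened version of the one above:

```r
library(ggplot2)
library(gapminder)
library(plotly)

static_plot <- ggplot(gapminder,
                      aes(x = gdpPercap, y = lifeExp,
                          size = pop, colour = continent)) +
  geom_point(alpha = 0.7) +
  scale_x_log10(labels = scales::dollar)

# Convert the static ggplot into an interactive plotly widget
# (hover tooltips, zooming, legend toggling).
interactive_plot <- ggplotly(static_plot)
interactive_plot
```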